We’re moving from the command line to the browser; C to PHP and HTML and CSS.
pset5
We’re also going to make a spell-checker for pset5, and we get to choose which data structure to use.
Ooh, pset5 is known to be the hardest pset! Good luck!
Hash functions take in data and shrink it down to a fixed-size value with an algorithm. An example is SHA-1.
A note about current events: unfortunately, researchers have found ways to attack SHA-1 (producing collisions far faster than brute force should allow), so people are now moving to SHA-256.
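To make the "shrinks it down" idea concrete, here is a minimal Python sketch using the standard library's `hashlib`. The helper name `digests` is made up for illustration; the only point is that the output size stays the same no matter how much data goes in.

```python
import hashlib


def digests(data: bytes) -> dict:
    """Hash `data` with SHA-1 and SHA-256 and return both hex digests."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


if __name__ == "__main__":
    small = digests(b"hi")
    large = digests(b"hi" * 100_000)
    # Each hex character encodes 4 bits, so these print the digest size in bits.
    print(len(small["sha1"]) * 4, len(large["sha1"]) * 4)      # 160 160
    print(len(small["sha256"]) * 4, len(large["sha256"]) * 4)  # 256 256
```

A two-byte input and a 200,000-byte input both come out as 160 bits under SHA-1 and 256 bits under SHA-256.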
Every internet-connected device has an address, which devices use to identify each other.
How are addresses given? A DHCP server assigns an address whenever you connect to a wifi network. These are called IP (Internet Protocol) addresses. They have the layout #.#.#.#, where each # is a number from 0 to 255, making a 32-bit address.
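Why 32 bits? Each of the four numbers fits in one byte (8 bits), and 4 × 8 = 32. The sketch below packs a dotted address into a single integer and back again. The function names `ipv4_to_int` and `int_to_ipv4` are my own, chosen just for this illustration.

```python
def ipv4_to_int(address: str) -> int:
    """Pack a dotted IPv4 address such as '10.0.0.1' into one 32-bit integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four numbers, got {address!r}")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"{octet} does not fit in one byte")
        # Shift what we have so far left by one byte and append the new byte.
        value = (value << 8) | octet
    return value


def int_to_ipv4(value: int) -> str:
    """Unpack a 32-bit integer back into dotted notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    print(ipv4_to_int("10.0.0.1"))          # 167772161
    print(int_to_ipv4(167772161))           # 10.0.0.1
    print(ipv4_to_int("255.255.255.255"))   # 4294967295
```

The largest address, 255.255.255.255, is 2^32 - 1, so there are only about 4.3 billion possible addresses.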
Another real-world connection: we’re running out of IP addresses, and the response has been slow. IPv4 is now being replaced by IPv6, which uses 128-bit addresses.
On work networks, IPs often share the same starting numbers. Home IP addresses are private IPs, following 10.#.#.#, etc. A router can convert private IPs to a public IP. To a website, everyone behind one public IP looks like the same person.
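As a rough illustration of the private-versus-public split, the sketch below checks whether an address falls in one of the standard private IPv4 blocks. The notes only name 10.#.#.#; the other two ranges (172.16.#.# through 172.31.#.#, and 192.168.#.#) are the usual ones covered by the "etc." and are my addition, as is the helper name `is_private_ipv4`.

```python
import ipaddress

# The three standard private IPv4 blocks, written in CIDR notation.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_ipv4(address: str) -> bool:
    """Return True if `address` lies inside one of the private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)


if __name__ == "__main__":
    print(is_private_ipv4("10.0.0.1"))       # True
    print(is_private_ipv4("216.58.192.78"))  # False
```

In the traceroute output further down, the first hop (10.0.0.1) is a private address on the home network, while every later hop is public.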
Websites (servers) have IP addresses too. The domain name system (DNS) maps names to IP addresses. You can buy (rent) a domain name for your website.
You can look up an IP address for a website with nslookup <website>.
nslookup google.com
## Server: 2001:558:feed::1
## Address: 2001:558:feed::1#53
##
## Non-authoritative answer:
## Name: google.com
## Address: 216.58.192.78
Most large websites have several IPs, distributed across servers to reduce load.
A router is a device that moves data from point A to point B. Routers can direct data onward to other places, serving as hubs for networks.
You can see the path your data takes through routers to somewhere else with traceroute <website>. By default this sends three queries per hop; you can use the option -q to set how many queries you want.
traceroute -q 1 google.com
## traceroute to google.com (216.58.192.110), 64 hops max, 52 byte packets
## 1 10.0.0.1 (10.0.0.1) 1.526 ms
## 2 96.120.85.41 (96.120.85.41) 9.823 ms
## 3 po91-sr01.wilmington.ga.savannah.comcast.net (68.85.93.129) 9.862 ms
## 4 te-0-0-0-4-ur04.savannah.ga.savannah.comcast.net (68.86.250.193) 9.303 ms
## 5 ae-20-ar02.southside.fl.jacksvil.comcast.net (68.87.165.13) 19.706 ms
## 6 be-33489-cr02.miami.fl.ibone.comcast.net (68.86.95.45) 21.658 ms
## 7 hu-0-11-0-4-pe01.nota.fl.ibone.comcast.net (68.86.82.6) 20.833 ms
## 8 as15169-2-c.nota.fl.ibone.comcast.net (66.208.228.98) 43.862 ms
## 9 209.85.253.120 (209.85.253.120) 34.043 ms
## 10 216.239.42.79 (216.239.42.79) 20.724 ms
## 11 mia07s35-in-f110.1e100.net (216.58.192.110) 21.317 ms
To get to google.com, we go to Comcast’s Wilmington Island server, to Savannah, to Jacksonville, to Miami, to Nota? And then we stay in Florida.
traceroute -q 1 www.cnn.co.jp
## traceroute: Warning: www.cnn.co.jp has multiple addresses; using 14.0.36.197
## traceroute to p11269.cdngc.net (14.0.36.197), 64 hops max, 52 byte packets
## 1 10.0.0.1 (10.0.0.1) 1.872 ms
## 2 96.120.85.41 (96.120.85.41) 10.112 ms
## 3 po91-sr01.wilmington.ga.savannah.comcast.net (68.85.93.129) 9.954 ms
## 4 xe-7-1-3-0-ar05.savannah.ga.savannah.comcast.net (68.86.250.21) 14.863 ms
## 5 162.151.19.50 (162.151.19.50) 12.338 ms
## 6 10g-8-1-ur01.southsiderdc.fl.jacksvil.comcast.net (68.86.168.49) 14.207 ms
## 7 be-33489-cr02.56marietta.ga.ibone.comcast.net (68.86.95.49) 28.403 ms
## 8 be-11424-cr02.dallas.tx.ibone.comcast.net (68.86.85.22) 47.751 ms
## 9 be-11524-cr02.losangeles.ca.ibone.comcast.net (68.86.87.173) 78.938 ms
## 10 be-10915-cr01.sunnyvale.ca.ibone.comcast.net (68.86.86.97) 85.455 ms
## 11 hu-0-13-0-0-pe02.529bryant.ca.ibone.comcast.net (68.86.86.94) 86.843 ms
## 12 124.215.192.125 (124.215.192.125) 93.152 ms
## 13 pajbb002.int-gw.kddi.ne.jp (111.87.3.53) 85.341 ms
## 14 otejbb205.int-gw.kddi.ne.jp (203.181.100.1) 180.145 ms
## 15 tm4bbac02.bb.kddi.ne.jp (125.53.98.86) 196.878 ms
## 16 182.248.216.214 (182.248.216.214) 188.639 ms
## 17 14.0.32.170 (14.0.32.170) 188.386 ms
## 18 14.0.36.197 (14.0.36.197) 179.944 ms
Wow, this is a long one. First, Wilmington Island, then Savannah, then Jacksonville, then Marietta (near Atlanta), then Dallas, then LA, then Sunnyvale, then Bryant (somewhere in California, I presume), and finally we get to Japan! We crossed the ocean, thanks to those undersea fiber-optic cables.
What is data anyways? It isn’t a continuous stream of bytes…it’s a set of packets! The data is split into small chunks, which are sent in the equivalent of virtual envelopes. Each envelope has a to-address, a from-address, and a number, enough to reconstruct the entire set of data.
What if a packet doesn’t get to its destination? This is where TCP/IP comes in, a combination of two protocols, IP and TCP. TCP is the protocol that makes sure the data arrives: if a packet goes missing, it is sent again.
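The envelope idea can be sketched in a few lines of Python. This is a toy model, not how TCP is really implemented: the `Packet` class, its `total` field, and the 8-byte chunk size are all assumptions made for illustration. It does show why the number on each envelope matters: packets can arrive in any order and still be reassembled, and the receiver can tell which ones need to be sent again.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    src: str        # from-address
    dst: str        # to-address
    seq: int        # position of this chunk in the original data
    total: int      # how many chunks the sender produced (toy field)
    payload: bytes


def split_into_packets(data: bytes, src: str, dst: str, size: int = 8) -> list:
    """Cut `data` into chunks of at most `size` bytes, one packet per chunk."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Packet(src, dst, seq, len(chunks), chunk)
            for seq, chunk in enumerate(chunks)]


def missing_sequence_numbers(packets: list) -> list:
    """List the sequence numbers that never arrived."""
    if not packets:
        return []
    received = {p.seq for p in packets}
    return [seq for seq in range(packets[0].total) if seq not in received]


def reassemble(packets: list) -> bytes:
    """Put the chunks back in order, refusing if any are missing."""
    missing = missing_sequence_numbers(packets)
    if missing:
        raise ValueError(f"need packets {missing} to be sent again")
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


if __name__ == "__main__":
    message = b"hello, world! this is a packet demo"
    packets = split_into_packets(message, "10.0.0.2", "216.58.192.78")
    random.shuffle(packets)                  # packets may arrive out of order
    print(reassemble(packets) == message)    # True
    lost = [p for p in packets if p.seq != 2]
    print(missing_sequence_numbers(lost))    # [2]
```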
Having just the to- and from-address is not enough. We also need the port number, which identifies the service that should open the packet.
| Port | Protocol | Usage |
|---|---|---|
| 21 | FTP | files |
| 25 | SMTP | email |
| 53 | DNS | domain name lookups |
| 80 | HTTP | web |
| 443 | HTTPS | web (encrypted) |
Your browser infers ports for websites but you can be explicit by appending :80 or :443.
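Here is a small sketch of that inference, using Python's standard `urllib.parse`. The `DEFAULT_PORTS` lookup and the function name `effective_port` are my own; a real browser knows many more schemes than these three.

```python
from urllib.parse import urlsplit

# Default ports taken from the table above.
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port in `url`, or the default for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port          # the URL said ":<number>" explicitly
    try:
        return DEFAULT_PORTS[parts.scheme]
    except KeyError:
        raise ValueError(f"no default port known for scheme {parts.scheme!r}")


if __name__ == "__main__":
    print(effective_port("https://www.google.com"))            # 443
    print(effective_port("http://www.google.com:80/"))         # 80
    print(effective_port("http://localhost:8080/index.html"))  # 8080
```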
A firewall routes all traffic through one device (e.g. a router). One kind of firewall works by having the DNS server return an incorrect IP address for a blocked site. To bypass this, you can use a different DNS server (such as Google’s 8.8.8.8) or type in the IP address of the blocked site. However, now Google sees which sites you look up.
Modern firewalls can also block certain ports and IPs.
On public wifi networks, packets can be read by anyone on the network. This can be used for malicious purposes. To encrypt your packets, you can use a VPN (virtual private network).
You do sacrifice speed for security, though. Your packets have to be routed via another server and therefore through more waypoints.
Why is this useful to know?
Well, HTTP is a protocol for accessing websites, a set of conventions.
HTTP
Some operations we perform with HTTP are:
HTTP/1.1 200 OK
Content-Type: text/html
We rarely see the status code 200 ourselves because it means everything went fine; on the other hand, 404 means “Not Found”.
| Code | Meaning |
|---|---|
| 200 | OK |
| 301 | Moved permanently (redirect) |
| 302 | Found (redirect) |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |
| 500 | Internal Server Error |
The Content-Type header says what kind of file is returned.
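To see how the pieces of a response fit together, here is a rough Python sketch that pulls the status code and headers out of the top of a response like the one shown earlier. The function names `parse_response_head` and `status_class` are hypothetical, and this is nowhere near a complete HTTP parser.

```python
def parse_response_head(raw: str):
    """Split the top of an HTTP response into (version, code, reason, headers)."""
    lines = raw.splitlines()
    if not lines:
        raise ValueError("empty response")
    parts = lines[0].split(" ", 2)       # e.g. ["HTTP/1.1", "200", "OK"]
    if len(parts) < 2:
        raise ValueError(f"bad status line: {lines[0]!r}")
    version = parts[0]
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        if not line:                     # a blank line ends the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers


def status_class(code: int) -> str:
    """Describe a status code by its first digit."""
    classes = {2: "success", 3: "redirect", 4: "client error", 5: "server error"}
    return classes.get(code // 100, "other")


if __name__ == "__main__":
    raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
    version, code, reason, headers = parse_response_head(raw)
    print(code, reason, headers["content-type"])  # 200 OK text/html
    print(status_class(404))                      # client error
```

Header names are lowercased here because HTTP treats them as case-insensitive.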
You can send key-value pairs in the URL as part of a request. For example, https://www.google.com/search?q=hello. What if you want to send an image? Or a password?
POST /login.php HTTP/1.1
Host: www.facebook.com
...
email=malan@harvard.edu&password=12345
The password can be hidden deeper in the “envelope”. Furthermore, HTTPS encrypts this.
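The two ways of sending key-value pairs can be compared side by side in Python. This is only a sketch built on the standard `urllib.parse`: the helper names `query_pairs` and `build_post_request` are invented here, the host `www.example.com` is a placeholder, and the request is assembled as text without being sent anywhere.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def query_pairs(url: str) -> dict:
    """Read the key-value pairs that appear after the '?' in a URL."""
    return parse_qs(urlsplit(url).query)


def build_post_request(host: str, path: str, fields: dict) -> str:
    """Build the text of a POST request with the pairs in the body, not the URL."""
    body = urlencode(fields)             # e.g. "email=...&password=..."
    head = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",  # urlencode output is plain ASCII
    ]
    # A blank line separates the headers from the body.
    return "\r\n".join(head) + "\r\n\r\n" + body


if __name__ == "__main__":
    print(query_pairs("https://www.google.com/search?q=hello"))  # {'q': ['hello']}
    request = build_post_request(
        "www.example.com", "/login.php",
        {"email": "malan@harvard.edu", "password": "12345"},
    )
    print(request)
```

In the output, the `@` in the email address shows up as `%40`, because special characters are escaped before being sent. Note that putting the password in the body only keeps it out of the URL; it is HTTPS that actually encrypts it.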
Woohoo! We made it!
<!DOCTYPE html>
<html>
<head>
<title>hello, world</title>
</head>
<body>
hello, world
</body>
</html>
HTML isn’t a programming language; it’s a markup language for websites.
Another language we have is CSS, a styling language. After all, black text on a white background isn’t very appealing.